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1*  Introduction  j  /  /  | 

This  paper  will  be  concerned  with  techniques  for  treating  a 
discrete,  finite  Markov  chain  whose  matrix  of  transition  probabilities 
can,  after  a  suitable  renunbering  of  the  states,  be  written  in  the  form 


(1.1) 


A11  E12  *•*  EU 


E21  A22  *•*  E2l 


E  • • •  An„ 

£1  12  I U. 


where  the  matrices  are  small.  The  matrix  A  is  nonnegative  and 

stochastic;  i.e. 


A1  -  1  , 

so  that  the  vector  1  consisting  of  all  ones  is  a  right  eigenvector 
of  A  corresponding  to  the  eigenvalue  one.  If,  in  addition,  A  is 
irreducible*,  the  eigenvalue  one  is  simple  and  there  is  a  unique,  nor¬ 
malized,  positive  left  eigenvector  y  corresponding  to  the  eigenvalue 

one  (in  the  irreducible  case  we  shall  call  the  eigenvalue  the  Perone 

T 

root).  If  A  is  acyclic  and  y  is  normalized  so  that  1  y  ■  1,  then 
y  is  the  vector  of  steady  state  probabilities  for  the  chain.  One  of 


*  For  terminology,  see  [14]. 
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the  chief  computational  problems  associated  with  Markov  chains  is  the 
determination  of  the  vector  y. 

Chains  with  transition  matrices  of  the  form  (1.1)  are  said  to  be 
nearly  completely  decomposible.  They  arise  naturally  as  models  of  systems 
whose  states  can  be  clustered  into  aggregates  that  are  loosely  connected 
to  one  another.  They  were  first  studied  by  Simon  and  Ando  [7],  who  had 
applications  to  economic  systems  in  mind.  A  recent  monograph  by  Courtois 
[3]  contains  a  history  of  the  subject  and  extensive  applications  in  the 
computer  sciences. 

The  usual  computational  procedure  goes  as  follows.  The  off-diagonal 
blocks  are  amalgamated  into  the  diagonal  blocks  A^  to  produce 

a  block  diagonal  approximation  A*  to  A  that  has  the  form 

A*  =  diagCA^,  A*2 . A*£)  . 

This  decomposition  is  Hone  in  such  a  way  that  each  block  A*,  is 
stochastic  and  irreducible.  The  steady  state  vectors  y^  of  the  A*^ 

are  then  computed  and  the  steady  state  vector  of  the  original  system 
approximated  in  the  form 

(1.2) 


The  quantities  are  calculated  as  the  components  of  an  eigenvector 

of  a  matrix  of  order  l  whose  elements  may  be  easily  calculated  from 


vf 


V2y2 


the  vectors  y*  and  the  original  matrix  A..  The  computational  advan¬ 
tages  of  this  method  are  obvious,  since  it  reduces  the  solution  of  a 
large  eigenvalue  problem  to  that  of  several  potentially  much  smaller 
ones. 

The  purpose  of  this  paper  is  to  resolve  two  difficulties  with  the 
method  as  it  is  currently  practiced.  The  first  concerns  the  determi¬ 
nation  of  the  approximating  decomposed  matrix  A*,  a  process  frequently 
refered  to  by  the  unfortunate  term  "aggregation".*  There  are  in¬ 
finitely  many  ways  to  incorporate  the  off-diagonal  blocks  of  A  into 
the  diagonal  blocks  in  order  to  get  an  approximation  A*.  In  some 
instances  this  flexibility  may  be  useful.  For  example  [13],  in  certain 
highly  structured  systems  it  is  possible  to  determine  the  diagonal  blocks 
A*^  so  that  the  eigenvectors  y*  are  exactly  proportional  to  the  cor¬ 
responding  pieces  of  y  [cf.  (1.2)].  In  general,  however,  the  indetermi- 
nacy  of  A  is  a  nuisance;  some  choices  of  A  may  be  better  than 
others,  but  without  further  information  there  is  no  way  of  knowing.  In 
particular,  the  derivation  of  any  general  error  bound  for  the  approxi¬ 
mation  (1.2)  must  necessarily  entail  the  assumption  that  the  worst 
choice  has  been  made.  In  this  paper  a  new  method  is  proposed  that  does 
not  require  intermediate  approximations  but  works  directly  with  the  ori¬ 
ginal  matrix  A. 

The  second  problem  treated  here  is  that  of  showing  how  reasonably 
sharp  error  bounds  may  be  computed.  Courtois  [2,  3]  has  given  an  error 
analysis  for  the  procedure  sketched  above,  which  shows  in  part  how  it 

*  By  all  natural  usage,  "aggregation"  should  refer  to  the  determination 
of  which  states  are  to  be  clustered  together. 
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behaves  as  the  off-diagonal  blocks  become  small.  However,  the  analysis 
is  not  suitable  for  computing  error  bounds  for  two  reasons.  First,  the 
analysis  is  asymptotic  in  the  size  of  the  off-diagonal  blocks,  and  it  is 
not  shown  how  small  the  blocks  must  be  for  it  to  be  approximately  correct. 
Second,  the  analysis  assumes  that  all  the  matrices  involved  have  complete 
systems  of  eigenvectors.  Although  it  is  unlikely  that  any  given  problem 
will  fail  to  have  this  property,  it  is  not  at  all  unlikely  that  it  will 
be  near  a  problem  that  does,  in  which  case  an  analysis  based  on  eigen¬ 
vector  expansions  will  give  unrealistic  results  owing  to  the  ill  condi¬ 
tion  of  the  matrix  of  normalized  eigenvectors. 

The  techniques  developed  in  this  paper  are  not  restricted  to  sto¬ 
chastic  matrices;  rather  they  can  be  applied  to  find  the  dominant  eigen¬ 
value  of  almost  any  matrix  of  the  form  (1.1).  What  is  required  is  that 
the  dominant  eigenvalues  of  the  be  simple  and  have  sufficiently 

we 11- conditioned  eigenvectors  and  that  the  be  sufficiently  small. 

If  A  is  stochastic,  these  conditions  are  likely  to  be  satisfied;  but 
as  will  be  seen,  the  computational  techniques  test  the  conditions  directly, 
without  reference  to  the  properties  of  A.  In  particular,  if  one  of  the 
A^  has  the  form  (1.1),  its  dominant  eigenvectors  can  be  found  independ¬ 
ently  of  A  by  the  method  described  in  this  paper.  This  observations  has 
important  consequences  for  the  process  of  multi-level  aggregation  described 
by  Court ois  [3]. 

This  paper  is  organized  as  follows.  Sections  2  and  3  lay  the  theo¬ 
retical  foundations  for  the  techniques  to  follow;  Section  2  describes  the 
deflation  of  a  simple  eigenvalue,  and  Section  3  reviews  perturbation  theory 
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for  invariant  subspaces.  In  Section  4  the  technique  is  sketched  broadly, 
and  in  Section  5  it  is  Justified  in  detail  by  the  derivation  of  effectively 
computable  error  bounds.  In  Section  6  the  practical  techniques  from  nu¬ 
merical  analysis  required  to  implement  the  method  are  discussed.  The  paper 
concludes  with  a  numerical  example. 

Many  of  the  results  of  this  paper  will  be  cast  in  terms  of  vector 
and  matrix  norms.  The  symbol  fl •  f  will  denote  either  the  Euclidean  vec¬ 
tor  norm  defined  by 

||x  H2  =•  xTx 

or  the  spectral  matrix  norm  defined  by 

||  A  |[  -  max  ||  Ax  || 

II  x  ||  -1 

The  symbol  ||  •  ||  ^  will  denote  the  Frobenius  matrix  norm  defined  by 

I*  l|  -  l  a l 

i»j  3 

Note  that  for  any  vector  x, 


For  more  on  these  norms  see  [8]. 


It  is  important  not  to  expect  too  much  of  error  bounds  cast  in  terms 


of  norms.  In  the  first  place,  repeated  use  of  inequalities  such  as  the 
triangle  inequality  tends  to  make  them  pessimistic.  In  the  second  place, 
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such  a  bound  can  be  difficult  to  interpret  in  terms  of  the  components 
of  the  vector  thus  bounded.  For  example,  if  ||e  ||  <  e,  then  any  com¬ 
ponent  of  e  can  be  as  large  as  e.  But  other  things  being  equal,  it 
is  more  likely  that  each  component  is  of  order  e//n  . 

In  cases  where  an  error  bound  is  unsatisfactory,  it  may  be  necessary 
to  calculate  an  error  estimate,  in  which  an  attempt  is  made  to  approximate 
the  error  vector  itself.  For  many  problems  this  can  be  done,  although 
frequently  a  heavy  computational  price  must  be  paid.  Moreover,  once  an 
error  estimate  has  been  calculated,  it  is  hard  to  resist  the  temptation 
to  use  it  to  improve  the  putative  solution,  which  will  set  off  another 
round  of  error  bounding.  To  feel  the  force  of  this  temptation,  the  reader 
is  invited  to  consider  the  table  of  estimates  given  on  page  187  of  [3]. 

2 .  The  constructive  theory  of  a_  simple  eigenvalue. 

In  this  section  are  collected  a  number  of  results  about  a  simple  eigen¬ 
value  and  its  eigenvectors  which  can  be  found  in  one  form  or  another  scat¬ 
tered  throughout  the  literature.  The  results  follow  from  a  constructive 
reduction  of  A  to  block  diagonal  form  by  means  of  rather  simple  similarity 
t  rans  formations . 

Let  A  be  a  matrix  of  order  n  with  a  real  simple  eigenvalue  B  cor¬ 
responding  to  the  eigenvector  x.  Since  x  is  nonzero,  it  may  be  norma¬ 
lized  so  that  |jx  H  “1.  Let  the  columns  of  nx(n-l)  matrix  Y  form  an 
orthonormal  basis  for  the  subspace  orthogonal  to  x;  i.e. 

(2.1)  YTY  *  I 


This  implies  that  the  matrix  (x  Y)  is  orthogonal. 
Consider  the  similarity  transformation 


(2.3) 


(x  Y)1  A  (x  Y) 


ta 

x  Ax 

T 

x  AY 

T 

Y  Ax 

T 

YaAY 

T 

fix  X 

T 

x  AY 

T 

BY  x 

T 

Y  AY 

It  follows  from  (2.2)  and  the  fact  that  x  x  =  1  that 

L  Tl 


(x  Y)1  A  (x  Y) 


8  g 

0  C 


where 


T  Tav 

g  *  x  AY  , 


C  -  Y  AY 


The  matrix  c  has  for  its  eigenvalues  the  eigenvalues  of  A  other  than 
0,  hence  C  -  01  is  nonsingular. 

Consider  now  the  further  similarity  transformation 


[l  -.I  f. 


8  gTl  \i 


[0  0qT  +  gT  -  qTc| 


'  -  I  ' 
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Since  C  -  81  is  nonsingular,  q  may  be  chosen  to  satisfy 

(C  -  6I)Tq  -  g  , 


from  which  it  follows  that  the  row  vector  in  the  upper  right  of  (2.4) 
is  zero.  Thus  the  two  similarity  transformations  (2.3)  and  (2.4) 
reduce  A  to  the  block  diagonal  form  diag(B,  B) . 

The  composite  similarity  transformation  that  reduces  A  can  be 
found  by  multiplying  the  two  transformations  (2.3)  and  (2.4).  Spe¬ 
cifically,  set 


(x  X) 


(x  Y) 


1 

0 


T 

q 

i 


(x  Y  +  xqT) 


and 


(2.5) 


(y  Y) 


(x  Y) 


1 

-q 


o 

i 


(X  -  Yq  Y) 


Then 


(2.6) 


(x  X) 


f 

y 

Y 


and 


(2.7) 


T 1 

y 

T  „ 

y  Ax  0 

6  0 

A  (x  X)  = 

T 

* 

0  YAX 

0  C 

A  number  of  important  facts  can  be  read  from  this  reduction.  In  the 

first  place  y  is  the  left  eigenvector  of  A  corresponding  to  8. 

T 

Since  from  (2.5),  y  x  =  1,  it  follows  that  y  is  not  orthogonal  to  x. 


-  9  - 


Since  from  (2.5) 


y  -  x  -  Yq 


an  alternate  expression  for  q  follows  from  (2.1)  and  (2.2) 


Moreover, 


q  =  -  Y  y 


Similarly, 


;|y  II2  -  i  +  II q 


|x  II2  =  i  +  I q  I2 


All  these  results  may  be  summarised  in  the  following  theorem. 


Theorem  2.1.  Let  8  be  a  simple  eigenvalue  of  a  matrix  A  of 
order  n,  and  let  the  corresponding  eigenvector  x  be  normalized  so  that 


1.  *  =  1 


Let  y  be  the  left  eigenvector  corresponding  to  8.  Then  y  is  not 
orthogonal  to  x  and  may  be  normalized  so  that 


T  i 

y  x  =  1 


Moreover  there  are  nx(n-l)  matrices  X  and  Y  such  that 


T  T 
Y  Y  =  Y  X  =  I 
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where 


T  T 

4.  y  X  «  x  Y  -  0 


5.  (x  X)"1  -  (y  Y)T 


6. 


(y  Y)T  A  (x  X) 


6  0 

0  C 


7.  C  =  YTAX  -  YTAY 

The  eigenvalues  of  the  matrix  C  are  the  eigenvalues  of  A 
If  q  is  defined  by  either  of  the  expressions 


8. 

9. 

then 

10. 

11. 

12. 


q  =  (C  -  8I)~TYTATx  , 
q  =  -YTy 

X  =  Y  +  xqT 
y  =  x  -  Yq 

lUII2  -  b  l2  ■  1  +  |q  I2 


3.  Perturbation  theory 

In  this  section  the  following  problem  will  be  addressed 
matrix  A  partitioned  in  the  form 

B  G 

A  - 

H  C 

J 


other  than  6. 


:  given  a 
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find  a  matrix  U  as  near  as  possible  to  the  identity  such  that  the  trans¬ 
formed  matrix  A  *  UAU  ^  has  the  form 

BO 
A  *  __ 

H  C 

The  importance  of  this  problem  lies  in  the  following  observation.  If  v 

-  T 

is  a  left  eigenvector  of  B  ,  then  (v  ,  0)  will  be  a  left  eigenvector 
-  T 

of  A  ,  and  (v  ,  0)U  will  be  a  left  eigenvector  of  A.  Since 

£  II  X  -  U  ||  , 

T 

the  vector  (v  ,  0)  will  be  a  good  approximate  left  eigenvector  of  A 
in  proportion  as  U  is  near  the  identity  matrix. 

This  problem  has  been  treated  in  [9],  and  the  following  is  a  sum¬ 
mary  of  the  results  required  in  this  paper.  The  reader  is  referred  to 
the  reference  for  proofs. 

The  problem  will  have  a  solution  only  if  the  eigenvalues  of  B  and 
C  are  separated  and  G  is  sufficiently  small.  Unfortunately,  the  minimum 
of  the  distances  between  the  eigenvalues  of  B  and  C  is  too  crude  a 
measure  of  separation  to  give  satisfactory  bounds.  Instead  the  measure 

6(B,  C)  *  inf  JJ BP  -  PC  |L 

l|p||F-i  r 

will  be  used.  The  properties  of  6(B,  C)  are  summarized  in  the  following 


theorem. 


-  12  - 


Theorem  3.1.  The  number  3  is  zero  If  and  only  If  B  and  C 
have  an  eigenvalue  In  common.  Moreover, 


1. 


6(B,  C)  <  min 


Y 


3  an  eigenvalue  of 
Y  an  eigenvalue  of 


b! 

C  »  ’ 


2.  5  (B+E,  C+F)  >  6  (B ,  C)  -  |E  ||  -  |F  | 


3 .  5  [diag(B^ ,  .  • . ,  B^)  *  ^lag(C^ ,  «-• . ,  C^)  3  "  min{  6  (B^ ,  )  • 

1*1,  ■>.,  PJ  j**l»  •••»  <1^  > 

4.  6(6,  C)  -  ||  (31  -  C)"1  fl-1  . 

Properties  2,  3,  and  4  in  the  theorem  are  particularly  important 
in  computational  practice.  Property  2  says  that  small  changes  in  B  and 
says  C  make  equally  small  changes  in  6(B,  C) ,  a  property  not  shared 
by  the  minimum  distance  between  the  eigenvalues  of  B  and  C.  Property  3 
shows  how  6  for  a  block  diagonal  matrix  can  be  found  from  the  6's 
between  the  blocks.  Finally,  property  4  gives  an  explicit  expression  for 
6  when  one  of  the  matrices  is  a  scalar.  These  properties  will  be  used 
extensively  in  the  derivation  of  the  error  bounds  in  §5. 

The  solution  to  the  problem  posed  at  the  beginning  of  this  section 
requires  that  U  be  chosen  in  a  specific  form.  Specifically,  U  will 
be  written 


I  -p 

(I  +  PPT)_1/2  0 

T 

lp  'j 

0  (I  +  ptp)~1/2 

T  -1  /  2 

Here  (I  +  PP  )  is  the  inverse  of  the  unique  positive  definite  square 


* 
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T  T  -1/2 

root  of  the  positive  definite  matrix  I  +  PP  —  similarly  for  (I  +  P  P) 

It  is  easily  verified  that  U  is  orthogonal;  i.e.  U^U  -I  or  U  ^  -  UT. 
Thus  the  problem  becomes  that  of  determining  P  so  that 

(3.2)  U1 

Conditions  under  which  this  can  be  done  are  contained  in  the  following 
theorem. 

Theorem  3.2.  In  the  notation  introduced  above,  let 


■ 

■  _ 

B  G 

B  0 

U  - 

H  C 

H  C 

(3.3) 


If 


Y  >  II G  II  p  ,  n  >  II H  S  F  , 

6  <  6(B,  C)  . 


(3.4) 


-r  =  2LX  <  I 

62  4  ’ 


then  there  exists  a  unique  matrix  P  satisfying 


(3.5) 


|p|f,x-1^E*  =  <2x 


1  -  2t  +  /1-4t  6 

such  that  U  defined  by  (3.1)  satisfies  (3.2).  Moreover, 
(3.6)  B  -  (I  +  PPT)~1/2  (B  +  PH)  (I  +  PPT)1/2  . 


The  distance  of  U  from  the  identity  matrix  is  roughly  ||P 


t 


p.lUr-r  » 
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The  bound  (3.5)  shows  that  this  distance  depends  linearly  on  || G  ||  ^ 

and  inversely  on  6(B,  C) .  In  other  words,  the  bound  becomes  smaller  as 

G  becomes  small  and  larger  as  the  spectra  of  B  and  C  approach  one 

another;  however,  the  bounds  can  be  considerably  worse  than  a  naive 

inspection  of  the  spectra  would  indicate  (cf.  1  in  Theorem  3.1). 

The  expression  for  B  is  particularly  interesting.  When  |[ H  ||  » 

0(  || G  ||),  as  it  will  in  §5,  both  PPT  and  PH  are  0(  ||g  ||^)  and  as 

G  approaches  zero  the  eigenvalues  of  B  approximate  those  at  the  ori- 

2 

ginal  matrix  up  to  terms  of  0 (  |  G  ||  ) .  This  quadratic  behavior  will 
prove  critical  in  deriving  workable  error  bounds. 


4.  The  approximation  algorithm 

In  this  section  an  algorithm  for  approximating  the  dominant  eigen¬ 


vector  of  a  matrix  of  the  form  (1.1)  will  be  described.  It  is  assumed 
that  the  diagonal  elements  A^  are  all  irreducible.  In  order  to  keep 
the  exposition  simple,  the  algorithm  will  be  described  for  a  3x3  parti¬ 
tioning,  i.e.  fl.  *  3.  The  general  case  is  a  trivial  extension. 

For  each  i,  let  S.  >  0  be  the  Perone  root  of  A.,  and  let  x.  >  0 

l  ii  i 

be  its  corresponding  right  eigenvector.  Since  6^  is  simple,  A^  has  a 
decomposition  of  the  form  described  in  Theorem  2.1,  viz. 


(yif  Y^1  A  (Xi,  XA) 


It  then  follows  from  Theorem  2.1  that  the  inverse  of  the  matrix 
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X  - 


is 


Y 


T 


Consider  the  matrix 


(4.1)  YTAX  = 


X1 

0 

0 

X1 

0 

0 

J 

0 

X2 

0 

0 

X2 

0 

0 

0 

x3 

0 

0 

X3. 

r  t 

yi 

0 

0 

0 

T 

y2 

0 

T 

0 

0 

y3 

T 

Yl 

0 

0 

• 

0 

T 

Y2 

0 

) 

0 

i. 

0 

yt 

3 

r— 1 

ca 

*12 

*13 

0 

T 

812 

T 

gl3 

1 

32 

*23 

T 

g2i 

0 

T 

g22 

i 

*31 

*32 

S3 

T 

S31 

T 

g32 

0 

'  3 

G 

■j 

! 

0 

h12 

h13 

C1 

F12 

F13 

H 

C 

>  ] 

1 

h21 

0 

h23 

F21 

C2 

F23 

l 

/> 

I 

h31 

h32 

0 

F31 

F32 

C3 

a 

1 

i 

where 
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'ij  yi  “ij  * 


T  T 

8lj  “  yi  Eij  Xj 


hij  "  Yi  Eij  Xj  * 


Lij 


Eid  xd 


Because  yi  >  0  and  x^  >  0,  it  follows  that  $  »  0  if  and  only  if 

E^  *  0.  Hence,  B  is  irreducible,  since  A  is.  Let  0  be  the  Perone 
root  of  B  with  right  eigenvector  u  and  left  eigenvector 


<V  V  v3> 


The  approximation  to  the  left  eigenvector  y  of  A  is  then  given  by 


(4.2) 


vlyl 


v2y2 


!  v„y 


y  3 


This  algorithm  is  extremely  simple.  All  that  it  requires  is  the 
calculation  of  the  left  and  right  Perone  vectors  of  the  diagonal  blocks 
of  A,  the  formation  of  the  matrix  B  from  the  ,  and  the  calculation 
of  the  left  Perone  vector  of  B.  Except  for  the  initial  grouping  of  states 
to  get  the  partition  (1.1),  the  process  is  entirely  deterministic,  re¬ 
quiring  no  assimilation  of  the  matrices  into  the  diagonal  blocks 
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5.  Error  bounds 

In  this  section  error  bounds  for  the  approximation  (4.2)  will  be 
derived.  The  bounds  provide  a  formal  proof  of  convergence  of  the  algo¬ 
rithm,  as  well  as  considerable  insight  into  its  behavior.  The  practical 
computation  of  the  bounds  will  be  discussed  in  the  next  section. 

The  approach  is  to  use  Theorem  3.2  to  obtain  an  exact  expression  for 
y  in  terms  of  a  vector  v  that  is  the  left  eigenvector  of  a  matrix  B 
lying  very  near  to  B.  A  second  application  of  Theorem  3.2  bounds 
Jv  -  v  ||  ,  and  hence  the  error  in  the  approximation  (4.2)  to  y. 

For  the  first  step,  the  notation  of  Theorem  3.2  coincides  exactly 
with  the  notation  of  the  approximation  algorithm.  Consequently  if  (3.2) 

is  satisfied,  there  is  a  matrix  U  of  the  form  (3.1)  that  reduces 

T  — 

Y  AX  to  the  form  (3.2).  Now  the  eigenvalues  of  B  are  eigenvalues  of  A. 

Let  v  be  the  left  eigenvector  of  B  corresponding  to  the  Perone  root 

of  A.  Then 

(5.1)  yT  -  (vT,  0)uTYT«  v^I  +  PPT)_1/2  (I,  P  )YT 

The  relation  between  V  and  v  must  now  be  considered.  Let 
(5-2)  IIP  1F  <  * 

be  any  bound,  presumably  obtained  by  an  application  of  Theorem  3.2.  As 
in  the  first  part  of  the  development  of  §2,  extend  v  to  an  orthogonal 
matrix  (v,  V)  such  that 

(5.3)  (v,  V)T  B  (v,  V)  - 
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and  let 


«p  <  5(P,  R)  . 


Let 


(v,  V)T  (B  +  PH)  (v,  V) 


By  Theorem  3.1 


s 

R 


6  (p  ,  R)  2.  6  -  2irr\ 

P 


Moreover, 

(5.4) 
and 

(5.5) 

Hence  by  Theorem  3.2  If 


||  r  |  <  \  r  ||  +  irn 

|s  |  i  in 


wti(  1  r  II  +  irn)  <  1. 

(«p  -  2irn)2  4 


there  is  an  eigenvector 


v  ■  v  +  e 


of  B  +  PH  satisfying 


I e 


<  —iin _ 

-  6p  -  2 irn 


(5.6) 
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From  (3.6)  we  have 


V  -  v(I  +  PPT)1/2  -  (v  +  e)  (I  +  PPT)1/2 


Hence  from  (5.1) 


yT  -  (v  +  e)T  (I,  P  )YT 


y\  o  o 

0  0 

T 

*  V 

0  y*  0 

T 

+  v  P 

0  Y2  0 

T 

T 

0  0  y^ 

«  CO 
>« 

o 

o 

_ 1 

yj  0  o 

T 

Y^  0  0 

T 

T 

T 

T 

+  e 

o  y2  o 

+  e  P 

o  y2  0 

o  o  y‘ 

0  0  Y? 

J . 

But  the  first  term  in  this  sum  is  the  approximation  (4.2)  to  y.  Hence, 
since  | y^  |  ■  1,  the  final  bound  becomes 


(5.7) 


y  - 


Vl 

V2 

lV3 


<  IT  + 


2irn 


max 


5  -  2irn 

P 


{ I y±  i >  + 


2 

2tt  n 


6  -  2iTrt 

P 


It  is  important  to  note  that  in  deriving  the  bound  (5.7)  it  has 
been  implicitly  assumed  that  the  Perone  root  of  A  was  to  be  found  in 
7  and  that  this  root  corresponded  to  the  Perone  root  of  B.  This  cannot 
be  insured  a  priori  without  making  further  assumptions.  Essentially  what 
is  required  is  that  the  eigenvalues  of  the  C^,  be  sufficiently  removed 
from  the  S.'s  so  that  subsequent  perturbations  cannot  turn  one  of  them 
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into  a  Perone  root.  Although  a  formal  analysis  Is  possible.  It  will  not 
be  given  here,  since  in  practice  it  is  easy  to  see  whether  the  largest 
eigenvalue  of  S’  is  the  Perone  root. 

When  A  is  stochastic,  the  0(  |) E  ])  )  perturbation  in  passing  from 
B  to  B  +  PH  is  critical  to  the  analysis.  This  is  because  all  the 
are  within  0 (  | E  || )  of  unity,  so  that  a  perturbation  of  0(  |E  ||)  will 
completely  scramble  the  eigenvectors. 

For  computational  purposes  we  have  scaled  the  x^  and  y^  so  that 
||xi  ||  *  1  and 

(5.8)  y^xjL  “  1 

In  fact  the  approximation  algorithm  will  yield  the  same  results  for  any 
scaling,  provided  only  that  (5.8)  is  satisfied.  To  see  this,  suppose 
that  x^^  is  replaced  by  xi  =  where  6^  is  a  nonzero  scaling 

A  -1 

factor.  Then  (5.8)  requires  that  y^  be  replaced  by  y^  »  6^  y^.  It 

is  easily  seen  that  this  results  in  B  being  replaced  by  D  ^BD,  where 

D  *  diag(6^,  62*  631)*  Consequently  the  left  eigenvector  of  D  ^BD  is 
T 

v  D  and  the  approximation  to  the  left  eigenvector  of  the  entire  system 

is  ('51v1y^»  62'J2^2’  63V3^  "  (vlyl’  v2y2’  V3yp »  which  is  the  same  as 
(4.2). 

For  stochastic  matrices  there  is  a  natural  scaling  of  the  y^  that 
leads  to  a  beautiful  interpretation  of  the  approximation  process.  Speci¬ 
fically,  let  y^^  be#scaled  so  that 

(5.9)  lTy±  -  1  , 
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i.e.  so  that  y^  can  be  interpreted  as  a  vector  of  probabilities.  Write 

*i  -  1  +  pi 

If  x^  is  given  the  scaling  implied  by  (5.8),  then 
T  T  T,  .  .  _ 

yipi  *  y^i  -  ytl  -  1  -  1  -  0  , 

and 

(5.10)  yiAiipi  "  6iylpi  "  0 

Moreover,  since  is  within  0(  ||E  || )  of  a  stochastic  matrix,  it 

follows  from  Theorems  3.1  and  3.2  that  if  6(8^,  C^)  is  large  enough 
the  vector  p.^  will  satisfy 

(5.11)  pt  -  0(  |E  |)  • 

2 

It  will  now  be  shown  that,  up  to  terms  of  0(  || E  |  ),  the  matrix  B 
in  (4.1)  is  stochastic.  Consider  the  first  row  sum 

S1  +  4 12  +  *13  '  ylAUXl  +  ylE12X2  +  ylE13x3 

•  »I‘4U-1  +  E12-1  +  E13-«  +  ylAllpl  +  yI<E12p2  +  E13*3^ 

-  yJ(Aui  +  e12i  +  E 131)  +  0(  II E  II2) 

by  (5.10)  and  (5.11).  Because  A  is  stochastic  A^l  +  E^l  +  Ei3~  * 
Hence 

8i +  *i2 +  *13  ■  y1.1  +  0(  UE  I2>  - 1  +  °< tE  I2>  * 


2 

by  (5.9).  Thus  the  first  row  sum  of  B  is  within  0(  ||E  ||  )  of  one, 
and  so  are  the  other  row  sums. 

The  nearly  stochastic  matrix  B  —  or  rather  B  which  differs 
2 

from  B  by  0(  ||E  |  )  —  controls  the  long  term  behavior  of  the  Markov 

chain.  To  see  this  note  that  by  an  application  of  Theorem  2.1  the  matrix 

(4.1)  can  be  reduced  to  the  form  diag(B,  C) .  Now  the  behavior  of  the 

k 

Markov  chain  is  controlled  by  the  behavior  of  the  powers  A  (k  *  1,2,3,...), 
and  this  behavior  can  be  determined  by  examining  the  behavior  of  the 
powers  of  diag(B,  C) . 

—  —  ^ 

Specifically,  diag  (B,  C)  ■  diag(B  ,  C  ).  Since  the  eigenvalues  of 

—  —  — 

C  are  less  than  those  of  B,  the  powers  will  approach  diag(B  ,  0). 

— 

Since  B  has  a  dominant  eigenvalue  of  one,  B  will  tend  more  slowly  to 

the  matrix  v  wT,  where  v  and  w  are  the  left  and  right  eigenvectors 

corresponding  to  one.  In  terms  of  the  original  Markov  chain,  if  the  state 
(k) 

vector  y  is  written  in  the  form 


y 


(k) 


v<k>y<k> 
1  yl 


vooyoo 

2  y2 


v 


(k)  (k) 
3  y3 


T  (k)  (k)  — k 

where  1  *  1,  then  the  y^  will  converge  swiftly  as  C  0  and 

(k)  — k 

the  v^  will  converge  more  leisurely  as  B  approaches  its  limit.  This 

justifies  calling  B  the  long  term  transition  matrix  of  the  chain. 

Of  course  this  double  limit  behavior  of  nearly  completely  decomposable 


chains  has  been  remarked  by  numerous  researchers,  beginning  with  Simon 
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and  Ando  [7];  the  approach  taken  here  merely  makes  explicit  the  factors 
that  control  the  rates  of  convergence.  Although  the  matter  will  not  be 
persued  in  this  paper,  it  should  be  possible  to  obtain  numerical  conver¬ 
gence  rates  from  an  analysis  of  the  behavior  of  the  powers  of  the  matrices 
B  and  C^. 

6.  Practical  details 

The  details  of  the  implementation  of  the  approximation  algorithm  and 
the  computation  of  the  bounds  will  depend  on  the  sizes  of  the  matrices  in¬ 
volved.  Three  classes  of  matrices  may  be  distinguished 

1.  Matrices  that  can  be  stored  as  an  array  in  the  high  speed 
memory  of  a  computer.  Typically,  an  upper  bound  for  the 
order  of  such  matrices  ranges  from  fifty  to  five  hundred, 
depending  on  the  computer. 

2.  Matrices  that  cannot  be  stored  in  array  form  but  whose 
structure  permits  the  efficient  solution  of  a  system  of 
linear  equations  with  the  matrix  elements  as  coefficients. 
Examples  of  such  matrices  are  band  matrices  and  "sparse" 
matrices  [4].  Their  orders  can  be  very  large. 

3.  Matrices  that  are  so  large  that  the  only  thing  one  can  do 
with  them  is  to  form  their  product  with  a  vector. 

Each  of  these  classes  will  be  discussed  in  turn. 


Hfcfcte  -’fA, 
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If  the  matrices  lie  in  the  first  class,  the  appropriate  pro¬ 

cedure  is  to  use  the  QR  algorithm  to  reduce  A^  to  quasi-triangular 
form.  Specifically,  there  exists  software  [11]  to  compute  an  ortho¬ 
gonal  matrix  (x^,  Y^) ,  such  that 


<V  V  Aii  <v  V  ■ 


where  C  is  quasi-triangular,  i.e.  block  upper  triangular  with  lxl 

and  2x2  blocks  on  its  diagonal.  Because  is  quasi-triangular  it 

is  extremely  cheap  to  solve  linear  systems  involving  -  31,  which 

means  that  it  is  practical  to  compute  q  in  part  8  of  Theorem  2.1, 

from  which  y^  can  be  calculated. 

The  next  step  is  to  compute  B.  Since  this  requires  a  pass  over 

the  matrices  E  . ,  this  is  also  a  logical  time  to  compute  the  bounds 

n  and  y  in  (3.3).  In  very  large  problems  it  is  unlikely  that  there 

will  be  storage  to  contain  the  matrices  and  Y^,  so  that  H  and 

G  cannot  be  computed  explicitly.  However,  bounds  may  be  obtained  by 

T  T 

the  following  procedure.  For  each  ^ ,  compute  E^x^  and  y^E^ . 
Then  set 


(6.1) 


(6.2) 


1 1  I2  K,*.  II2  -  I  K,*, 

i » j  3  3  i ,  j  3  3 


Y2  ■  III  #  !2  |x,  II2 

i.j  13 


T„  II  2  |i ..  n  2 


l  II  ^iEij  II  I  ^i 

i,j 


A  bound  <J>  on  the  off-diagonal  elements  of  C  will  be  needed  later, 
c 
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It  can  be  calculated  at  this  point  In  the  form 

(6.3)  *  =»  l  ||Y*  I  |E  ||  || X ,  ||  -  l  || E  «  || X.  || 

i»j  3  2  i,j  2  2 

The  application  of  the  perturbation  theorem  requires  a  lower  bound 
6  on  6(B,  C) .  Because  is  quasi-triangular ,  ||  (8  -  C)  ^  jj  can  be 
cheaply  estimated  by  a  variant  of  the  inverse  power  method  [1,6],  Set 
B+  =  max  {&  }  and  8  *  min  {8^}-  Then  set 

(6.4)  6  «*  min  ||  (8+I  -  CL)  1  ||  1  -  (B+  —  &  )  “  <frb—  4»c  » 

where  ^  is  any  upper  bound  on  the  norm  of  the  off-diagonal  part  of  B. 
It  can  easily  be  shown  by  repeated  appeals  to  Theorem  3.1  that  6  is 
indeed  a  lower  bound  for  <5(B,  C)  . 

Having  computed  n,  Y»  and  6,  the  bound  ir  in  (5.2)  can  be 

computed  from  (3.5).  The  matrix  B  may  now  be  analysed  in  the  same 
manner  as  the  [cf.  (5.3)]  to  get  a  bound  on  6  (p ,  R) .  At  this 

point  the  bounds  (5.4)  and  (5.5)  may  also  be  computed.  Finally,  the 
bound  (5.7)  the  accuracy  of  the  approximation  may  be  computed  . 

For  systems  of  large  sparse  matrices,  it  is  possible  to  compute 
right  and  left  eigenvectors  x^  and  y^  oy  means  of  the  inverse  power 
method  [  8  ].  However,  it  will  not  generally  be  possible  to  maintain 

the  matrices  ,  Y^,  and  in  the  high-speed  memory  of  the  computer 

in  question.  Fortunately  the  matrix  (x^,  Y^)  can  be  written  as  a  House¬ 
holder  transformation  in  the  form 


(*1,  Yi)  *  I  -  2wiw^ 
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where  w^  is  a  vector  of  norm  unity  that  can  be  determined  from  x^ 
alone  (for  details  see  [8]).  If  ■  1,  then  it  follows  from 

Theorem  2.1  that  ||x^  ||  ■  |Jy^  ||  .  Hence  n ,  y»  and  $c  may  be  es¬ 
timated  as  in  (6.1),  (6.2),  and  (6.3).  Finally,  since  from  item  7 

T 

in  Theorem  2.1,  it  follows  that  ■  Y^A^Y^  the  separation  6(B,  C) 

may  be  bounded  as  described  above,  but  this  time  using  the  technique  of 

T 

implicit  deflation  [12]  to  solve  systems  involving  Y^AY^  “  SI*  The 

rest  of  the  bounding  process  proceeds  as  above. 

For  matrices  of  the  third  class,  i.e.  matrices  for  which  only  matrix- 

vector  multiplication  is  possible,  the  eigenvectors  x^^  and  y  must  be 

computed  by  the  power  method  or  its  variants  [5,  10].  The  numbers  n  and 

Y  may  be  computed  as  described  above.  Unfortunately,  there  is  no  way  of 

computing  a  lower  bound  on  <5(B,  C)  since  there  is  no  way  of  solving  linear 

T 

systems  involving  *  Y^Y^.  The  best  that  can  be  done  is  to  compute  an 
approximate  upper  bound  that  may  be  a  reasonable  estimate.  From  item  8 
in  Theorem  2.1  we  have 

ilq  1  <  |(ct  -  s^)'1 1|  II YIAiiXi  3  . 

and  from  item  12 , 


Hence,  in  analogy  with  (6.4),  it  may  be  hoped  that  the  number 
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6 


min 


(  |yt  II2  -  1)1/2 


-  4> 


C 


will  not  differ  too  much  from  S(B,  C) . 


6 .  A  numerical  example . 

In  this  section  the  techniques  of  this  paper  will  be  applied  to  a 
matrix  analysed  previously  by  Courtois.  The  matrix  with  its  partitioning 
is  given  in  Table  1.* 

Table  2  gives  the  details  of  the  calculation  of  the  approximation 
y  in  (4.2).  The  vector  is  compared  with  the  true  vector  y,  which 
has  been  scaled  so  that  || y  -  y  ||  is  a  minimum.  Table  3  gives  the  de¬ 
tails  of  the  computation  of  the  error  bound  (5.7). 

The  error  bound  is  good  enough  for  practical  purposes,  even  though 
it  is  an  order  of  magnitude  bigger  than  the  actual  error.  This  overesti¬ 
mate  is  in  part  due  to  the  repeated  use  of  norm  inequalities  in  the  deri¬ 
vation  of  the  bound.  It  is  also  due  to  the  fact  that  Theorem  3.2  bounds 
Jp  Jy,  whereas  it  is  clear  that  the  smaller  spectral  norm  could  be  used 
in  the  derivation  of  (5.7).  As  was  pointed  out  in  the  introduction,  a 
pessimistic  view  of  the  error  is  inevitable  when  on  attempts  to  bound  it 
rather  than  estimate  it. 
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The  matrix,  as  reported  in  [2,3]  is  not  stochastic;  the  sixth  row  does 
not  sum  to  one.  Since  the  property  of  being  stochastic  is  not  necessary 
to  the  procedure  described  in  this  paper,  the  matrix  is  left  as  reported. 


0.00000  0.00005  0.00000  0.00000  0.00005  0.19990  0.25000  0.55000 


Table  2 


Computation  of  y  (4.2) 


8 


i 


x 


1 


0,9991  0.9993  0.9993 
0.5772  0.6954  0.7074  0.8080  0.5774  0 
0.5773  0.7218  0.7069  0.6061  0.5774  0 
0.5776  0.3150  0.5774  0 


B 


0.9991 

0.0010 

0.0001 

0.0005 

0.9993 

0.0001 

0.0002 

0.0001 

0.9999 

0.4433 

0.6130  0.6539 

y 

y 

0.308281 

0.307971 

0.320022 

0.320321 

0.139653 

0.139726 

0.495361 

0.495323 

0.371565 

0.371616 

0.272669 

0.272734 

0.629330 

0.629282 

0.230652 

0.230659 

l|y  -  y  II 

•  10“' 4 


.4170 

.9624 

.3527 


5 


Table  3 


Computation  of  the  Error  Bound 


Norms  of  Off-diagonal  Matrices 

n  «  1 H  I]  F  «  4.54  •  10-4 
Y  -  || G  1|F  -  2.14  •  10"4 
$  «  9.76  •  10“4 

b 

<f>  =*  9.29  •  10"4 

c 

Computation  of  S  (6.4)  and  ir  (3.5) 

6+  -  0.9999  S_  »  0.9991 

||  (6+  -  C  )-1  If1  =-  0.2352,  0.6992,  0.4487 
6  -  0.2325 

t  -  0.0013  [from  (3.4)] 

it  -  9.22  •  10'4 


Final  error  bound 

6  -  2.81  •  10"4 
P 

| r  |  -  3.095  [cf  (5.3)] 

x  -  0.0017 
j| e  |  <  2.99  •  10-3 
II y  -  y  II  <  4.23  •  10 


[cf  (5.6)] 


